TCGA

Why are there fewer open-access TCGA mutations in DR 32 (GENCODE Update Release)?

Two quality-improving changes account for most of the reduction in open-access mutations: 1) TCGA now uses a 2-caller ensemble instead of a single caller; 2) variants outside the target capture region are removed, where the filter previously used a combined “target capture + GAF exonic region”. Additionally, TCGA was the original project for which GDC open-access variants were produced, and it used variant rescue steps that applied only to TCGA. To keep the variant-calling pipeline consistent across projects, the GDC no longer rescues MC3 and TCGA validation variants.
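The combined effect of the two filters can be illustrated with a toy example. This is only a sketch under stated assumptions, not the GDC pipeline: it assumes "2-caller ensemble" means a variant is kept when at least two callers report it, and the `Variant` type, the `keep_open_access` helper, and the interval coordinates are all hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int               # 1-based position
    callers: frozenset     # names of the callers that reported this variant


def in_target(variant, intervals):
    """True if the variant lies inside any (chrom, start, end) capture interval."""
    return any(
        chrom == variant.chrom and start <= variant.pos <= end
        for chrom, start, end in intervals
    )


def keep_open_access(variant, intervals, min_callers=2):
    """Assumed rule: enough supporting callers AND inside the capture region."""
    return len(variant.callers) >= min_callers and in_target(variant, intervals)


# Made-up capture interval and variants, for illustration only.
targets = [("chr1", 100, 200)]
variants = [
    Variant("chr1", 150, frozenset({"MuTect", "MuSE"})),  # kept
    Variant("chr1", 160, frozenset({"MuTect"})),          # dropped: one caller
    Variant("chr1", 500, frozenset({"MuTect", "MuSE"})),  # dropped: off target
]
kept = [v for v in variants if keep_open_access(v, targets)]
```

Each filter removes variants independently, so a release that applies both will contain fewer variants than one that applied a single caller and a wider region.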

Where can I find clinical data elements specific to my cancer research of interest?

The GDC supports the submission of clinical and biospecimen supplements. Supplemental files can be downloaded from the GDC by searching for the Data Type "Clinical Supplement" or "Biospecimen Supplement" in the facet search of the GDC Data Portal Repository. For TCGA data, the supplement data is provided as XML documents and tab-delimited files (biotabs). To varying degrees, these files provide information on marker status (e.g. EBV status), treatment regimen, slide magnification, histology distinctions, and staging questions.
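The same search can also be expressed as a request to the GDC API `files` endpoint. The sketch below only builds the request URL and does not send it. The project ID `TCGA-STAD` and the helper names are illustrative choices, and the field names (`cases.project.project_id`, `data_type`) should be checked against the current GDC API documentation before use.

```python
import json
from urllib.parse import urlencode

GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def supplement_filter(project_id, data_type="Clinical Supplement"):
    """Build a GDC filter: files of one data type belonging to one project."""
    return {
        "op": "and",
        "content": [
            {"op": "in",
             "content": {"field": "cases.project.project_id",
                         "value": [project_id]}},
            {"op": "in",
             "content": {"field": "data_type", "value": [data_type]}},
        ],
    }


def supplement_query_url(project_id, data_type="Clinical Supplement", size=20):
    """Return a GET URL listing matching files (hypothetical helper)."""
    params = {
        "filters": json.dumps(supplement_filter(project_id, data_type)),
        "fields": "file_id,file_name,data_format",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


url = supplement_query_url("TCGA-STAD")
```

Passing `data_type="Biospecimen Supplement"` would build the equivalent query for biospecimen supplements.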

What is the difference between tissue "collection" and tissue "procurement" in TCGA data?

In TCGA data, “collection” refers to the acquisition of the sample for TCGA, whereas “procurement” refers to the removal of tissue from the patient.

Why is the data maintained in cBioPortal, Broad Firehose, or the Seven Bridges Cancer Genomics Cloud different from the GDC data?

The GDC harmonizes data across projects. This includes aligning the genomic data to a common reference genome (GRCh38, also known as hg38) and generating higher-level data using GDC bioinformatics pipelines. Other repositories may process the data differently. For example, TCGA data in cBioPortal uses the original mutation data generated by the individual TCGA sequencing centers. The source of those data is the Broad Firehose (or the publication pages, for data that matches a specific manuscript). These data are usually a combination of two mutation callers, but the callers differ by center (typically a variant caller like MuTect plus an indel caller), and sequencing centers have modified their mutation-calling pipelines over time. In the GDC, by contrast, TCGA mutations are called against GRCh38 using four variant callers: MuTect, VarScan2, MuSE, and Pindel.

How do I access data from TCGA marker or other landmark cancer genomics papers?

The TCGA marker and other landmark cancer genomics papers, as well as associated supplemental files, are available on the GDC Publication Pages. The Publication Pages provide access to publication information and supplementary files.

Why does the treatment data appear to be incomplete and what treatment data is available in the GDC?

Submitting treatment data is optional, as not all projects are associated with treatment studies. Among TCGA projects, for example, not all projects and cases have treatment data. For TCGA projects with treatment data, the information is available in the applicable clinical supplement files (i.e. clinical XML, biotabs). For other projects associated with treatment studies whose treatment data has been submitted to the GDC, the treatment data is available for download in JSON and TSV format. These studies may also contain clinical supplement files.

Does the GDC provide access to follow-up (i.e. longitudinal) data?

The availability of follow-up data is specific to the project and associated study. For the Multiple Myeloma Research Foundation (MMRF) Clinical Outcomes in Multiple Myeloma to Personal Assessment of Genetic Profile (CoMMpass) study, longitudinal information was generated to track patients over the course of their disease. This data can be viewed and downloaded from each case page in the GDC Data Portal, or retrieved by querying the GDC API. For TCGA, follow-up data is available for specific TCGA studies and can be downloaded in the associated clinical supplement files (i.e. clinical XML, biotabs). Follow-up data may differ between TCGA studies.
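As a rough illustration of the API route, the sketch below builds a `cases` query that asks for follow-up records and then reads them out of a response. Several things here are assumptions to verify against the GDC API documentation: the project ID `MMRF-COMMPASS`, the `follow_ups` expand option, and the `days_to_follow_up` field. The sample response is hand-written with a made-up case ID to show the expected shape and is not real data.

```python
import json
from urllib.parse import urlencode

GDC_CASES_ENDPOINT = "https://api.gdc.cancer.gov/cases"


def follow_up_query_url(project_id, size=10):
    """Build a GET URL requesting cases with their follow-up records expanded."""
    filters = {
        "op": "in",
        "content": {"field": "project.project_id", "value": [project_id]},
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "submitter_id",
        "expand": "follow_ups",   # assumed expand name for follow-up entities
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_CASES_ENDPOINT}?{urlencode(params)}"


def follow_up_days(response):
    """Map each case's submitter_id to its sorted days_to_follow_up values."""
    result = {}
    for hit in response["data"]["hits"]:
        days = [
            fu["days_to_follow_up"]
            for fu in hit.get("follow_ups", [])
            if fu.get("days_to_follow_up") is not None
        ]
        result[hit["submitter_id"]] = sorted(days)
    return result


# Hand-written response in the assumed shape; the case ID and values are made up.
sample = {"data": {"hits": [
    {"submitter_id": "CASE-0001",
     "follow_ups": [{"days_to_follow_up": 90}, {"days_to_follow_up": 30}]},
]}}
url = follow_up_query_url("MMRF-COMMPASS")
timeline = follow_up_days(sample)
```

Cases without follow-up records simply map to an empty list, which matches the point that availability varies by project.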

Why might variants found in TCGA-generated MAFs be missing from the GDC open access MAF files?

Some of the reasons particular mutations may have been removed include updates to third party databases, more conservative germline-masking rules by the GDC, and different mutation calling pipelines and versions. Despite these differences, the GDC recaptures over 97% of TCGA-validated variants in the controlled-access MAF files. The GDC suggests using controlled-access MAF files if important variants cannot be found in somatic MAF files.
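To see which variants are affected, one option is to compare two MAF files directly. The sketch below lists variants present in one MAF but absent from another. It assumes both files are on the same reference build (a MAF on an older build would need coordinate conversion first) and that both contain the standard MAF columns named in `KEY_COLUMNS`. The helper names and the toy rows are made up for illustration.

```python
import csv

# Columns used to identify a variant; all are standard MAF column names.
KEY_COLUMNS = (
    "Chromosome",
    "Start_Position",
    "Reference_Allele",
    "Tumor_Seq_Allele2",
    "Tumor_Sample_Barcode",
)


def variant_keys(maf_text, key_columns=KEY_COLUMNS):
    """Return the set of variant keys in a MAF, skipping '#' comment lines."""
    lines = [
        line for line in maf_text.splitlines()
        if line and not line.startswith("#")
    ]
    reader = csv.DictReader(lines, delimiter="\t")
    return {tuple(row[col] for col in key_columns) for row in reader}


def missing_variants(reference_maf, candidate_maf):
    """Variants found in reference_maf that do not appear in candidate_maf."""
    return variant_keys(reference_maf) - variant_keys(candidate_maf)


# Toy MAF contents with made-up rows, for illustration only.
header = "\t".join(KEY_COLUMNS)
controlled = (
    "#version 2.4\n" + header + "\n"
    "chr1\t100\tA\tT\tSAMPLE-A\n"
    "chr2\t200\tG\tC\tSAMPLE-A\n"
)
open_access = header + "\nchr1\t100\tA\tT\tSAMPLE-A\n"
missing = missing_variants(controlled, open_access)
```

Including the sample barcode in the key keeps the same mutation in two different samples from being treated as one variant.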

Where can I find hg18 and hg19 GAF files for legacy TCGA data?

These files are available on the [GDC Reference Files page](/about-data/data-harmonization-and-generation/gdc-reference-files).
